{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">### 🚩 *Create a free WhyLabs account to complete this example!*<br> \n",
    ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylabs-free-sign-up?utm_source=github&utm_medium=referral&utm_campaign=Local_Models)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=Local_Models) to leverage the power of whylogs and WhyLabs together!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Langkit with Local Models\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/LanguageToolkit/blob/main/langkit/examples/Local_Models.ipynb)\n",
    "\n",
    "Some of the Langkit modules download models from the internet. This is not always possible, for example, when running in an environment without internet access. In this example, we will show how you can use Langkit with models stored locally."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by installing LangKit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install 'langkit[all]>=0.0.32'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're also assuming the existence of local models in specific folders, such as when downloading the models with the script below.\n",
    "\n",
    "Make sure you have git-lfs installed. If not, you can install it by running:\n",
    "\n",
    "`sudo apt-get install git-lfs`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cloning into 'local-toxicity-model'...\n",
      "remote: Enumerating objects: 40, done.\u001b[K\n",
      "remote: Total 40 (delta 0), reused 0 (delta 0), pack-reused 40\u001b[K\n",
      "Unpacking objects: 100% (40/40), 301.27 KiB | 414.00 KiB/s, done.\n",
      "Cloning into 'local-sentence-transformers'...\n",
      "remote: Enumerating objects: 49, done.\u001b[K\n",
      "remote: Counting objects: 100% (3/3), done.\u001b[K\n",
      "remote: Compressing objects: 100% (3/3), done.\u001b[K\n",
      "remote: Total 49 (delta 0), reused 0 (delta 0), pack-reused 46\u001b[K\n",
      "Unpacking objects: 100% (49/49), 316.57 KiB | 311.00 KiB/s, done.\n",
      "Filtering content: 100% (3/3), 260.15 MiB | 16.47 MiB/s, done.\n"
     ]
    }
   ],
   "source": [
    "!git clone https://huggingface.co/martin-ha/toxic-comment-model local-toxicity-model\n",
    "!git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 local-sentence-transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The  `martin-ha/toxic-comment-model` is the model currently used in `toxicity`, and `sentence-transformers/all-MiniLM-L6-v2` is used to generate embeddings in both `themes` and `input_output_modules`. We can pass the local paths when initializing the modules:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langkit import themes\n",
    "from langkit import toxicity\n",
    "from langkit import input_output\n",
    "\n",
    "from langkit import LangKitConfig\n",
    "\n",
    "local_config = LangKitConfig(toxicity_model_path=\"local-toxicity-model\",\n",
    "              transformer_name=\"local-sentence-transformers\")\n",
    "\n",
    "toxicity.init(config=local_config)\n",
    "themes.init(config=local_config)\n",
    "input_output.init(config=local_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If, for example, we want a local version for the `llm_metrics` module, we also need to import `textstat`, `regexes`, and `sentiment`. `regexes` and `textstat` are lightweight models and don't require external artifacts, so we can use them in a network restricted environment. `sentiment`, however, downloads artifacts from the internet, so let's replace it with `vader_sentiment`, which will yield the same results as `sentiment`, with the benefit of not requiring downloading artifacts at runtime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langkit import regexes\n",
    "from langkit import vader_sentiment\n",
    "from langkit import textstat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we should have an equivalent version of `llm_metrics` that doesn't require internet access. Let's check the results for a toy example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'prompt': 'I like you. I love you',\n",
       " 'response': 'thanks!',\n",
       " 'prompt.jailbreak_similarity': 0.2522321939468384,\n",
       " 'response.refusal_similarity': 0.1535428911447525,\n",
       " 'prompt.toxicity': 0.006519913673400879,\n",
       " 'response.toxicity': 0.0011597275733947754,\n",
       " 'response.relevance_to_prompt': 0.23008441925048828,\n",
       " 'prompt.has_patterns': None,\n",
       " 'response.has_patterns': None,\n",
       " 'prompt.vader_sentiment': 0.7717,\n",
       " 'response.vader_sentiment': 0.4926,\n",
       " 'prompt.flesch_reading_ease': 119.19,\n",
       " 'response.flesch_reading_ease': 121.22,\n",
       " 'prompt.automated_readability_index': -6.7,\n",
       " 'response.automated_readability_index': 12.0,\n",
       " 'prompt.aggregate_reading_level': 1.0,\n",
       " 'response.aggregate_reading_level': 0.0,\n",
       " 'prompt.syllable_count': 6,\n",
       " 'response.syllable_count': 1,\n",
       " 'prompt.lexicon_count': 6,\n",
       " 'response.lexicon_count': 1,\n",
       " 'prompt.sentence_count': 2,\n",
       " 'response.sentence_count': 1,\n",
       " 'prompt.character_count': 17,\n",
       " 'response.character_count': 7,\n",
       " 'prompt.letter_count': 16,\n",
       " 'response.letter_count': 6,\n",
       " 'prompt.polysyllable_count': 0,\n",
       " 'response.polysyllable_count': 0,\n",
       " 'prompt.monosyllable_count': 6,\n",
       " 'response.monosyllable_count': 1,\n",
       " 'prompt.difficult_words': 0,\n",
       " 'response.difficult_words': 0}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from whylogs.experimental.core.udf_schema import udf_schema\n",
    "from langkit import extract\n",
    "\n",
    "text_schema = udf_schema()\n",
    "result = extract({\"prompt\":\"I like you. I love you\",\"response\":\"thanks!\"},schema=text_schema)\n",
    "\n",
    "result"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langkit-rNdo63Yk-py3.8",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
